ABSTRACT


This thesis introduces exploratory data analysis methods
into the question of categorizing pilots and relating these
categories to accident potential. The flight data usually
recorded deal with a pilot's total flight experience,
recency, and frequency of flying. The purpose of categorizing
is to determine whether the recorded flight data can help
discriminate between two sample groups of fifty pilots each:
those pilots with accidents during FY73 and those without.

The technique of linear discriminant analysis indicated
that there is a significant difference in the mean vectors
of flight data for the two groups. The computed discriminant
function produced an empirical correct classification rate
of 84%. Techniques of cluster analysis (with the aid of
principal components analysis) are also employed to detect
patterns or differences in the data. Curiously, the amount
of time flown in the last 48 hours is associated with
relatively low accident potential, whereas time flown in the
last 24 hours seems to be correlated with a higher accident
potential.
I. INTRODUCTION AND OBJECTIVES

Aviation safety in the United States Navy has always
received considerable attention. With the rapidly increasing
costs of naval aircraft and the increasing costs of
training naval aviators, it is imperative that every possible
aspect of aviation safety be thoroughly investigated.

It  is  important  to  search  all  paths  which  may  yield  any 
information  at  all  having  a bearing  on  aircraft  accident 
causation  or  prevention. 

Over  the  last  five  years,  approximately  fifty  per  cent 
of  the  major  and  minor  aircraft  accidents  in  the  Navy  have 
included  pilot  error  as  either  the  primary  factor  involved 
or  as  a contributing  factor  to  the  cause  of  the  accident. 

Many causes of pilot error accidents have been proposed,
ranging from a plain lack of physical coordination to mental
incompetence. A general term which relates to both physical
and mental abilities is experience. That is, as flying
experience increases, the learning process should increase
both of these abilities. Another general term which affects
these two abilities is proficiency. That is, recency and
frequency of flying should also have a direct bearing on
these abilities.

This thesis explores methods for classifying or categorizing
pilots according to variables associated with their
experience and proficiency. Accident records are used to
determine if there is any relation between the classifications
and the occurrence of accidents.

One would like to know whether a pilot's experience and
proficiency data indicate a high or a low accident potential.
Of specific interest is the question of whether pilot error
accidents are related to lack of total flying experience,
lack of experience in type of aircraft, or lack of practice
due to insufficient current flying. If an individual were
classified as having a high degree of accident potential,
then corrective action could be taken to reduce this
potential.

Only the pilots of Navy fixed-wing aircraft are studied.
Marine and/or helicopter pilots are not included. The study
encompasses those accidents that occurred during fiscal year
1973. Unfortunately, the data base contains the records of
only fifty aviators who have been involved in pilot error
accidents. Fifty other pilots were selected as a control.
Even with these small numbers, a result appeared that may be
worth pursuing further: recency of flying may be overdone.
The amount of time flown in the last 48 hours is positively
correlated with low accident potential, but a reversal seems
to take place when looking at the time flown in the last
24 hours.
II. FACTORS INFLUENCING EXPERIENCE AND PROFICIENCY


There are many factors affecting experience and proficiency.
Situations encountered, crises faced, types of
missions flown, and many other qualitative factors have a
definite bearing. However, the only factors considered here
are quantitative variables which can be obtained from accident
records and IFARS (Individual Flight Activity Reporting
System) pilot records.

The Naval Safety Center at Norfolk, Virginia maintains
records of all accidents in which Naval aircraft are involved.
The recorded data items which reflect a pilot's total
experience are the following:

Number of years designated a naval aviator
Total flying hours
Total flying hours in the model aircraft in which the accident occurred
Total day carrier landings
Total night carrier landings

The data items which reflect his proficiency (i.e. his
recency and frequency of flying) are the following:

Time all series this aircraft in last 90 days
Time this model this aircraft in last 90 days
Elapsed time since last previous flight
Time flown in the last 24 hours
Time flown in the last 48 hours
Number of missions flown in the last 24 hours
Number of missions flown in the last 48 hours
Number day carrier landings in last 30 days
Number night carrier landings in last 30 days
Instrument trainer time in last 90 days
Weapons system trainer time in last 90 days

The Individual Flight Activity Reporting System (IFARS),
a part of the Naval Safety Center, maintains flight records
on all naval aviators by fiscal year. The only data items
pertaining to pilot experience which are retrievable from
computer access for all fiscal years are:

Number  of  years  designated  a naval  aviator 
Total  flying  hours 

At present, the following additional experience items are
retrievable by computer only from the beginning of fiscal
year 1969 and thus cannot be used as comparison variables,
since many of the aviators in both sample groups began
flying prior to 1969.

Total  time  by  model 

Day  and  night  carrier  landings  by  model 
Other  type  landings  by  model 
Instrument  time  by  model 

A new compilation is now in progress by the IFARS section
at the Naval Safety Center to record all flights on
computer files for all fiscal years for all pilots so that
future studies can be more encompassing.

The proficiency indicator data items for those pilots
in the accident group have a natural base point from which
to be measured. That is, an item such as "time flown in the
last 48 hours" means the last 48 hours directly prior to the
accident in which the pilot was involved. However, for
the non-accident (control) group, there is no such reference
point from which to measure. Thus, comparison of proficiency
data items becomes rather nebulous.

One reasonable way to give significant meaning to the
term proficiency is to artificially construct similar data
items by an averaging procedure. For example, prior to each
flight (for the period in question) compute the time flown
in the preceding 48 hours. Do this for every flight during
the fiscal year and then obtain an average time flown in
the preceding 48 hours. The necessary data can be obtained
from a detailed flight listing for the pilots in the control
group for FY73. This procedure can be utilized for the
following data items:

Time all series this aircraft last 90 days
Elapsed time since last previous flight
Time flown in the last 24 hours
Time flown in the last 48 hours
Number of missions flown in the last 24 hours
Number of missions flown in the last 48 hours
Number day carrier landings in last 30 days
Number night carrier landings in last 30 days

With these artificially constructed data items one can
include proficiency in the comparison between the control
group and the accident group. The appropriateness of doing
this can be determined by comparing the results of statistical
analyses performed with and without these added variables.
If these added variables give a better delineation between
groups, then it is appropriate to include them.
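The averaging procedure for the control group can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thesis does not describe the layout of the detailed flight listing, so the `(takeoff_hour, duration_hours)` tuples, the function name `average_hours_in_window`, and the choice to count the first flight of the year (with nothing preceding it) as zero are all hypothetical.

```python
def average_hours_in_window(flights, window_hours=48.0):
    """Average, over all flights, of the flight time logged in the
    window_hours immediately preceding each takeoff.

    flights: list of (takeoff_hour, duration_hours) tuples, with times
    measured in hours from the start of the fiscal year (assumed layout).
    """
    totals = []
    for takeoff, _ in flights:
        window_start = takeoff - window_hours
        flown = 0.0
        for other_takeoff, other_duration in flights:
            if other_takeoff >= takeoff:
                continue  # only earlier flights can fall in the window
            other_landing = other_takeoff + other_duration
            # Portion of the earlier flight lying inside [window_start, takeoff)
            overlap = min(other_landing, takeoff) - max(other_takeoff, window_start)
            if overlap > 0:
                flown += overlap
        totals.append(flown)
    return sum(totals) / len(totals)


# Hypothetical listing for one control-group pilot.
flights = [(0.0, 2.0), (24.0, 1.5), (60.0, 3.0), (200.0, 1.0)]
avg_48 = average_hours_in_window(flights, 48.0)
avg_24 = average_hours_in_window(flights, 24.0)
```

The same loop, counting flights instead of summing overlap, would give the averaged "number of missions" items.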


To recap, the variables which are common to both groups
and which are used for the analysis are:

(X1) Number of years designated a naval aviator
(X2) Total flying hours
(X3) Time all series this aircraft last 90 days
(X4) Time since last previous flight
(X5) Time flown in the last 24 hours
(X6) Time flown in the last 48 hours
(X7) Number of missions flown in the last 24 hours
(X8) Number of missions flown in the last 48 hours
(X9) Number of day carrier landings in last 30 days
(X10) Number of night carrier landings in last 30 days
III. SELECTION OF GROUPS


The accident group was composed of all those pilots
who were involved in pilot error accidents during fiscal
year 1973. This group comprised 66 different pilots; no
pilot had more than one accident attributable to pilot error.
Due to incomplete data in two cases, this was reduced to
64 pilots.

The control group was more difficult to establish since
there were several thousand aviators from which to choose.
A subset of these pilots was obtained that satisfied two
criteria: (1) it appeared to be a sample representative of
all naval aviators, and (2) the data were relatively easy to
obtain. The sample taken was the first 100 aviators on the
IFARS files. Since the IFARS files are ordered by increasing
social security number and the increments between successive
numbers were very large, examination of the typographical data
leads us to believe that social security numbers had no
bearing upon age, length of time in aviation duties, or even
length of time in the Naval Service. There was no obvious
reason to think that the sample was unrepresentative.

From the 100 pilots initially assigned to the control
group, 20 were helicopter pilots and 15 were Naval Flight
Officers, thus leaving 65 subjects in the control group.

Since the size of the two groups under study is arbitrary,
a further reduction in the size of each group was made to
meet a computational constraint which was imposed by a
computer program employed in the actual analysis. Because
of the extensive computational effort required in the
analytical techniques used, the use of a digital computer
was mandatory. One of the computer programs used for the
analysis had a limitation of 100 data units. Therefore, a
random selection of 50 subjects was chosen for each of the
two groups under study. (The random selection was accomplished
in the manner of drawing numbers out of a hat.)
IV. INVESTIGATIVE APPROACHES


The data describing the subjects is composed of ten
pieces of information for each subject. This constitutes
a multivariate data set. Therefore, some sort of multivariate
statistical technique is appropriate. Which statistical
techniques to employ depends upon the information
desired to be obtained from the analysis, and is the primary
concern of this section.

As stated in the Introduction, one of the primary objectives
is to establish a classification scheme and then to
determine if this classification is related to the occurrence
of accidents. One statistical procedure which treats this
problem is that of discriminant analysis. Discriminant
analysis is a multivariate statistical technique used for
constructing decision rules by which data units (subjects,
or pilots in the present context) can be classified as
members of one group or another.¹ The goal is to assign
subjects to the groups to which they have the greatest
resemblance based upon a profile of their characteristics,
while at the same time minimizing the effects of
misclassification.²


¹Anderberg, M.R., Cluster Analysis for Applications,
p. 191, Academic Press, Inc., 1973


²Eisenbeis, R.A., and Avery, R.B., Discriminant Analysis
and Classification Procedures, p. 3, Lexington Books, 1975


The procedure constructs a discriminant function based
upon input data in which subjects are members of known groups.
This discriminant function is usually linear but can be
quadratic or have other forms. The data are used to make the
function specific (determine the parameters). Typically,
it is then used to reassign the original subjects to one of
the two groups on the basis of their characteristics in order
to make an empirical determination of the rate of
misclassification. If all subjects are reassigned to the group
from which they initially came, then there is zero percentage
misclassification and perfect discrimination between groups.
The discriminant function can also be used to categorize
other observations (subjects), whose group membership is
unknown, on the basis of their attributes.

If several (more than two) groups are present, then a
set of discriminant functions is constructed to assign
observations to the appropriate groups.

A linear discriminant function will be constructed for
the two pilot groups on the basis of their experience and
proficiency characteristics. If the function discriminates
well, then one can determine what particular characteristics
have the strongest influence on placing a subject in the
accident group.³ Also, by applying the discriminant function
to subjects not in the original test groups one can determine
their accident potential.


³Press, S.J., Applied Multivariate Analysis, pp. 376-379,
Holt, Rinehart and Winston, Inc., 1972
The assumptions upon which discriminant analysis is
based and the actual mathematics will be covered in the
next section.

If  the  discriminant  function  fails  to  separate  the  groups 
without  a high  rate  of  misclassification,  the  lack  of  success 
can  be  attributed  to  one  of  two  causes.  The  first  is  that 
the  variables  characterizing  the  subjects  do  not  distinguish 
between  the  groups  to  a strong  enough  degree  or  the  groups 
overlap  too  much  in  the  given  measurement  space.  The  second 
is  that  the  groups  cannot  be  separated  by  a function  of  the 
form  chosen  for  the  analysis.  That  is,  maybe  instead  of  a 
linear  discriminant  function  we  should  have  a quadratic  or 
more  complex  one. 

To illustrate the preceding concept, let the accident
group be denoted by "A" and the control group by "C". Now,
if one considers the groups in two dimensions only (instead
of the actual ten) the groups might be clumped as in Figure
(1).


Figure (1)


In this case a linear discriminant function would serve to
separate the groups well and it is not necessary to construct
a quadratic function. If, however, the data appeared as in
Figure (2), then one can see that a linear discriminant
function cannot discriminate among the groups without error.
However, a quadratic form of discriminant function such as
the curve depicted might very well have excellent
discriminating capabilities.


The linear discriminant function is a tool that is
immediately available in terms of computer programs. It
is based upon the assumption that the data came from a
multivariate normal population, and when this assumption is
met, it works as well as any other discriminant function.
Other discriminant functions are not readily available for
use. Also, the linear discriminant function could do a good
job even if the multivariate normal assumption is not met,
i.e. when the natural separation of groups is so great that
even a simple method would do the job.
For the problem at hand, the use of the linear discriminant
function was encouraging, but since the assumption of
multivariate normality is not appropriate (e.g. rotation
policies split variables X1 and X2 so that their distributions
are multimodal) it was decided to explore the nature
of the data to see if a better job could be done.

Exploratory data analysis on 100 points in Euclidian
10-space is not easy. Some form of cluster analysis is
called for; that is, the subjects must be clustered into
groups. This leads to the question of how many groups we
actually have and how the data are grouped.

Cluster analysis is actually a collection of techniques
that are used to group multidimensional entities according
to various criteria of their degrees of homogeneity or
heterogeneity. For example, in this problem grouping will
be on the basis of the values of each variable which
describes the pilot's flight experience and proficiency. Pilots
with high total flight time might tend to cluster into one
group while pilots with few carrier landings or with little
time since last flight might tend to cluster into other
groups. How close the values of the variables should be
before subjects are grouped into the same cluster is the
question of the degree of homogeneity desired, and how many
clusters there should be is the question of the degree of
heterogeneity desired. This type of grouping is called
grouping by subjects; that is, the entities are subjects.
The entities can also be the variables themselves, in which
case the clustering is said to be by attributes.

There are several pertinent questions to bear in mind
when performing a cluster analysis. How many clusters are
inherent in the data? Since attributes may be measured in
different units, should the attributes be standardized
before they are clustered? How large should the errors be
before they are considered intolerable? There will be one
type of error made by not assigning similar entities into
the same group, and another type of error made by grouping
dissimilar entities into the same cluster. Should all
possible pairs of points (or attributes) be scrutinized for
similarities? Not all of these questions have definite
answers, but they will be addressed in the next section.

In most other statistical techniques, such as analysis
of variance, the variables usually possess some structure
of belonging to particular populations a priori. Consequently,
it is often possible to assume particular distributions for
the populations and make associated inferences. In clustering
problems, however, the principal concern is how to establish
appropriate populations. Thus, clustering analysis logically
precedes the application of most other multivariate
procedures when the data do not possess structured form.⁷

There are two possible approaches to clustering:
enumerative and non-enumerative procedures. Enumerative
means simply to list all the possible groupings of subjects
(or of attributes, if the clustering is by attributes).
The number of possible groupings is given by a Stirling
Number of the Second Kind. For example, in clustering
twenty-five subjects into five groups there are between two
and three quadrillion possibilities from which to choose
the best grouping. This is not feasible even with a computer,
especially when the problem is much larger than this.
Some feasible non-enumerative techniques are described in
the next section.
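The size of the enumeration problem can be checked directly. The sketch below computes Stirling Numbers of the Second Kind from the standard recurrence S(n, k) = k S(n-1, k) + S(n-1, k-1); the function name `stirling2` is chosen here for illustration and does not come from the thesis.

```python
def stirling2(n, k):
    """Number of ways to partition n distinct subjects into k non-empty groups."""
    # table[i][j] holds S(i, j); S(0, 0) = 1 and S(i, 0) = 0 for i > 0.
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            # Subject i either joins one of the j existing groups
            # or starts a new group of its own.
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]


print(stirling2(25, 5))  # 2436684974110751, i.e. about 2.4 quadrillion
```

For the 100 subjects of this study the count is vastly larger again, which is why only non-enumerative techniques are practical.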

If  through  the  use  of  cluster  analysis  one  can  find  a 
feasible  set  of  groupings  that  have  meaning  to  this  problem 
then  the  groupings  can  be  analysed  by  a discriminant  analysis 
to  obtain  the  desired  classification  procedure. 

Clustering  by  variables  can  also  prove  to  be  worthwhile 
in  that  it  can  help  to  determine  if  some  of  the  variables 
are  redundant  and  not  providing  any  additional  information. 


⁷Ibid.
If so, those redundant variables can be eliminated or
combined, thus simplifying the required computations. A
method of combining variables which was utilized was that
of principal components analysis.

Multivariate analysis by the principal components
model attempts to reduce the dimension of the problem while
retaining as much of the information (i.e. variation)
contained in the original data as possible. The method
produces linear combinations of the original variables which
maximize the variance of the resultant weighted sum. Thus
attention is centered primarily on the variables with the
greater variability by the appropriate assignment of the
weights. This linear combination of the variables is called
the first principal component and reduces our set of old
variables to one variable. If it is desired to extract more
variance from the data, one can construct a second principal
component which is orthogonal to the first. The process can
be repeated until there are as many components as original
variables, at which point one-hundred percent of the total
variance has been extracted.
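A minimal sketch of extracting the first principal component follows. It assumes the components are taken from the sample covariance matrix of the raw variables (the text here does not say whether the variables were standardized first) and uses simple power iteration rather than whatever routine the original analysis employed. The function names are illustrative.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of rows (divisor n - 1)."""
    n, m = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(m)]
    return [[sum((r[u] - mean[u]) * (r[v] - mean[v]) for r in rows) / (n - 1)
             for v in range(m)] for u in range(m)]


def first_principal_component(rows, iterations=200):
    """Return (weights, variance, share of total variance) of the first component."""
    cov = covariance_matrix(rows)
    m = len(cov)
    # Power iteration; assumes the all-ones start vector is not orthogonal
    # to the leading eigenvector.
    w = [1.0] * m
    for _ in range(iterations):
        w = [sum(cov[u][v] * w[v] for v in range(m)) for u in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    variance = sum(w[u] * cov[u][v] * w[v] for u in range(m) for v in range(m))
    total = sum(cov[k][k] for k in range(m))
    return w, variance, variance / total


# Hypothetical two-variable data lying exactly on a line: one component
# should carry all of the variance.
weights, variance, share = first_principal_component(
    [[0, 0], [1, 2], [2, 4], [3, 6]])
```

The weighted sum of a subject's (mean-centered) variables using `weights` is that subject's score on the combined variable.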

The objective of principal components analysis is not
merely to reduce the size and complexity of the problem,
but also to glean information from the data which might not
otherwise be obvious. Specifically, in the problem under
study here, the fifth, sixth, seventh and eighth variables
listed at the end of Section II can be regarded as prime
indicators of frequency of flying. However, when the data are
analyzed (by cluster analysis) the exact effects of these
variables might not readily be apparent. When all these
variables are combined into one variable (i.e. the first
principal component) the effect of frequency might be quite
obvious. That is, it might be observed that frequency of
flying has an inverse relationship with the occurrence of
accidents.

For this analysis, of those variables listed at the end of
Section II, the first and second (years designated naval
aviator and total hours flown) were combined to get a "total
experience" variable; and the fifth, sixth, seventh and
eighth (time flown in the last 24 hours, time flown in the
last 48 hours, number of missions flown in the last 24 hours,
and number of missions flown in the last 48 hours) were
combined to get a "frequency" variable.
V. ANALYTICAL TECHNIQUES


The first analytic technique applied to the two groups
of pilots was that of discriminant analysis with the primary
objective being to develop an accurate linear discriminant
function. (Actually, the purposes of discriminant analysis
are first to determine if there is a difference among
population means or equivalently if there are any overlaps among
the groups, and secondly, to construct classification schemes
based upon the descriptive variables.)

There are three basic underlying assumptions of
discriminant analysis. They are (1) that the groups being
investigated are discrete and identifiable, (2) that each
observation (subject) in each group can be described by a set of
measurements on m characteristics or variables, and (3) that
these m variables have a multivariate normal distribution in
each population and equal covariance matrices among
populations. The first two assumptions are seen to be
satisfied as discussed in previous sections. The third
assumption indicates the need for separate statistical tests to
determine if the variables are multivariate normal and if the
covariance matrices are equal. It has been mentioned that
non-normal multivariate data do not necessarily bias the
results of a discriminant analysis. Also, since no
satisfactory tests exist for testing populations to be
multivariate normal, it is difficult to routinely test the
normality assumption. Finally, the central limit theorem
suggests that as the number of observations increases, the
discriminant values for each group approach a normal
distribution.¹⁰

The assumption of equality of covariance matrices
(i.e. equality of within group dispersions) appears to be
more critical in biasing the results. Eisenbeis and Avery
suggest that linear classification rules are not adequate
when unequal covariance matrices exist and that quadratic
classification rules should be employed.¹¹

The within group dispersion matrices for the two groups
of data were computed and are shown in Table VI in Appendix
C. The pooled within-groups dispersion matrix is also
shown. The group dispersion matrices were tested for equality
by the procedure given in Appendix D.

After satisfying the assumptions preparatory to the
actual analysis one can first test the equality of group
means. The null hypothesis is:

$$H_0: \mu_1 = \mu_2$$

where

$$\mu_1 = (\mu_{1,1}, \mu_{1,2}, \ldots, \mu_{1,10})$$

and

$$\mu_2 = (\mu_{2,1}, \mu_{2,2}, \ldots, \mu_{2,10}).$$


The  following  steps  are  used  by  the  BIMED04M  computer 
program  to  test  for  the  equality  of  group  means: 

Step (1): the means for each group are computed

$$\bar{x}_i = (\bar{x}_{i,1}, \bar{x}_{i,2}, \ldots, \bar{x}_{i,10}), \quad i = 1, 2$$

Step (2): the differences in group means are computed

$$\bar{x}_1 - \bar{x}_2 = (\bar{x}_{1,1} - \bar{x}_{2,1}, \ldots, \bar{x}_{1,10} - \bar{x}_{2,10})$$

Step (3): the matrices $S^1$ and $S^2$ are computed, where an
element of $S^i$ is given by

$$s^i_{u,v} = \sum_{j=1}^{n_i} (x_{i,j,u} - \bar{x}_{i,u})(x_{i,j,v} - \bar{x}_{i,v})$$

and $i = 1, 2$; $u = 1, 2, \ldots, 10$; and $v = 1, 2, \ldots, 10$.

Step (4): the matrix A is computed

$$A = S^1 + S^2$$

where $(a^{j,1}, a^{j,2}, \ldots, a^{j,10})$ is the jth row of $A^{-1}$

Step (5): the Mahalanobis $D^2$ statistic is computed

$$D^2 = (n_1 + n_2 - 2) \sum_{i=1}^{m} \sum_{j=1}^{m} a^{i,j} (\bar{x}_{1,i} - \bar{x}_{2,i})(\bar{x}_{1,j} - \bar{x}_{2,j})$$

where i and j here index the variables.

Step (6): the F statistic is computed

$$F = \frac{n_1 n_2 (n_1 + n_2 - m - 1)}{m (n_1 + n_2)(n_1 + n_2 - 2)} D^2$$

with $m$ and $n_1 + n_2 - m - 1$ degrees of freedom,
where $n_1$ and $n_2$ are the respective sizes of the two
groups and m is the number of variables.


The null hypothesis can be rejected when the value of the
test statistic is greater than the tabled value of F for
the desired level of significance.
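Steps (1) through (6) can be traced with a short Python sketch. It is not the BIMED04M code, which is not reproduced in this thesis; it assumes that the weights in Step (5) are the elements of the inverse of A, as the Mahalanobis distance requires, and it obtains them by solving a linear system rather than inverting A explicitly. All function names are illustrative.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]


def sscp(rows, mean):
    """Sums of squares and cross-products about the group mean (Step 3)."""
    m = len(mean)
    return [[sum((r[u] - mean[u]) * (r[v] - mean[v]) for r in rows)
             for v in range(m)] for u in range(m)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.
    Raises ZeroDivisionError if a is singular."""
    m = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def d2_and_f(group1, group2):
    """Return (D squared, F, (numerator df, denominator df))."""
    n1, n2, m = len(group1), len(group2), len(group1[0])
    xbar1, xbar2 = mean_vector(group1), mean_vector(group2)            # Step 1
    diff = [p - q for p, q in zip(xbar1, xbar2)]                       # Step 2
    s1, s2 = sscp(group1, xbar1), sscp(group2, xbar2)                  # Step 3
    a = [[s1[u][v] + s2[u][v] for v in range(m)] for u in range(m)]    # Step 4
    weights = solve(a, diff)                                           # A^{-1} d
    d2 = (n1 + n2 - 2) * sum(w * d for w, d in zip(weights, diff))     # Step 5
    f = (n1 * n2 * (n1 + n2 - m - 1)
         / (m * (n1 + n2) * (n1 + n2 - 2))) * d2                       # Step 6
    return d2, f, (m, n1 + n2 - m - 1)
```

With a single variable the F value reduces to the square of the familiar two-sample t statistic, which gives a quick sanity check on any implementation.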

The construction of the discriminant function is
predicated upon minimizing the effects of misclassification and
assigning subjects to the group to which they have the
greatest resemblance. The effects of misclassification
depend upon the a priori knowledge of group membership and
the costs or penalties of misclassification. The BIMED
programs assume no special a priori probabilities of group
membership, i.e. the probability of belonging to either
group (in the two-group case) is one-half. They also assume
the costs of misclassification to be equal, i.e. the cost
of assigning an actual member of group number one to group
number two is the same as that of assigning a member of group
number two to group number one.

The measure of resemblance is determined by the m
characteristics which describe each subject. By substituting
the values of the characteristics into each group's
probability density function it is determined how closely the
subject resembles the group as compared with the rest of the
population. The BIMED programs yield the coefficients and
constants for the linear discriminant function for each
group in the total population.
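The per-group coefficients and constants can be illustrated with the textbook form of the linear classification function under equal prior probabilities and equal costs: for group i the coefficient vector is S⁻¹x̄ᵢ and the constant is -½ x̄ᵢ'S⁻¹x̄ᵢ, where S is the pooled within-groups covariance matrix, and a subject is assigned to the group with the largest score. That this matches the BIMED output exactly is an assumption; the function names below are illustrative only.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]


def pooled_covariance(groups):
    """Pooled within-groups covariance matrix (divisor N - number of groups)."""
    m = len(groups[0][0])
    dof = sum(len(g) for g in groups) - len(groups)
    pooled = [[0.0] * m for _ in range(m)]
    for g in groups:
        mu = mean_vector(g)
        for r in g:
            for u in range(m):
                for v in range(m):
                    pooled[u][v] += (r[u] - mu[u]) * (r[v] - mu[v]) / dof
    return pooled


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def classification_functions(groups):
    """One (coefficients, constant) pair per group."""
    s = pooled_covariance(groups)
    functions = []
    for g in groups:
        mu = mean_vector(g)
        coeffs = solve(s, mu)                                   # S^{-1} xbar_i
        constant = -0.5 * sum(c * x for c, x in zip(coeffs, mu))
        functions.append((coeffs, constant))
    return functions


def classify(x, functions):
    """Index of the group whose linear score is largest (equal priors and costs)."""
    scores = [sum(c * xi for c, xi in zip(coeffs, x)) + constant
              for coeffs, constant in functions]
    return scores.index(max(scores))


def correct_classification_rate(groups, functions):
    """Empirical rate obtained by reassigning the original subjects."""
    correct = sum(classify(x, functions) == i
                  for i, g in enumerate(groups) for x in g)
    return correct / sum(len(g) for g in groups)
```

Reassigning the original subjects in this way gives the empirical correct classification rate discussed earlier; because the same data build and test the rule, the figure tends to be optimistic.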

In order to determine what effect the chosen variables
had on proficiency and experience it was desirable to
measure the association among the variables. The association
measure employed was the product-moment correlation
coefficient. The correlation computations and correlation
matrix for the entire data set are given in Appendix F.
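For reference, the product-moment correlation matrix can be computed as in the sketch below. The function name and the toy data are illustrative; the thesis's own computations are those given in its appendix.

```python
import math


def correlation_matrix(rows):
    """Product-moment (Pearson) correlations between the columns of rows."""
    n, m = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(m)]
    dev = [[r[k] - mean[k] for k in range(m)] for r in rows]
    # Sums of cross-products of deviations; the divisor n - 1 cancels
    # in the ratio, so it is omitted.
    cross = [[sum(d[u] * d[v] for d in dev) for v in range(m)] for u in range(m)]
    return [[cross[u][v] / math.sqrt(cross[u][u] * cross[v][v])
             for v in range(m)] for u in range(m)]


# Hypothetical data: column 2 rises with column 1, column 3 falls with it.
r = correlation_matrix([[1, 2, 3], [2, 4, 2], [3, 6, 1]])
```

A variable with zero variance would make a denominator zero, so constant columns must be removed beforehand.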
Ibid.


The problem of how to group the variables given this
association measure can be solved through the use of
hierarchical clustering techniques. These techniques can also
be useful to cluster by data units (subjects) which have a
different association measure.

For the association measure among data units, most
investigators use metric measures when the data units are
described by interval variables. Metric measures must
satisfy certain properties. If E is a given measurement
space and X, Y, and Z are points in E, then an association
function D is a metric measure if and only if it satisfies
the following conditions:

(1) D(X,Y) = 0 if and only if X = Y
(2) D(X,Y) ≥ 0 for all X and Y in E
(3) D(X,Y) = D(Y,X) for all X and Y in E
(4) D(X,Y) ≤ D(X,Z) + D(Y,Z) for all X, Y and Z in E

The most common metric measure is the Euclidean distance
function, $D_2(X_j, X_k) = \left[ \sum_{i=1}^{n} (x_{ij} - x_{ik})^2 \right]^{1/2}$.
This is a special case of the general class of metrics called
Minkowski metrics which have the form
$D_p(X_j, X_k) = \left[ \sum_{i=1}^{n} |x_{ij} - x_{ik}|^p \right]^{1/p}$,
where $p \ge 1$ and $X_j = (x_{1j}, x_{2j}, \ldots, x_{nj})$ is the vector
of scores on the j-th data unit. In this analysis the
Euclidean distance function was used to cluster the data
units.
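The two distance functions can be illustrated with a minimal sketch. The function names and the two score vectors are hypothetical and are not taken from the thesis data; the code only restates the Minkowski form, with the Euclidean distance as the p = 2 case.

```python
def minkowski_distance(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length score vectors."""
    if len(x) != len(y):
        raise ValueError("score vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1 for the function to be a metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def euclidean_distance(x, y):
    """Euclidean distance, the p = 2 special case of the Minkowski metric."""
    return minkowski_distance(x, y, p=2.0)


# Hypothetical score vectors for two data units (illustrative values only).
unit_j = [3.0, 0.0, 1.0]
unit_k = [0.0, 4.0, 1.0]
print(euclidean_distance(unit_j, unit_k))       # 5.0
print(minkowski_distance(unit_j, unit_k, p=1))  # 7.0
```

With p = 1 the same function gives the sum of absolute differences, which shows how the choice of p changes the weight given to large differences on a single variable.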

The  hierarchical  methods  are  used  to  construct  a tree 
(dendrogram)  depicting  the  relationship  among  the  entities. 

The  entities  are  grouped  into  clusters  in  order  of  their 
association  measures  or  similarities.  The  ordering  provides 
a hierarchy,  thus  the  name.  The  similarities  can  be  of  many 
forms  of  association  measures;  the  general  term  applied  to 
the  matrix  being  a similarity  matrix. 

A breakdown of hierarchical methods yields agglomerative
and non-agglomerative procedures. The agglomerative
procedures start with the branches (each entity) and combine these
entities until there is but one remaining cluster (the root).
The alternative procedures work from the root backward.
Only the former was used in this analysis.

There are many actual techniques and criteria of
hierarchical clustering. Initially each entity is considered to
be a cluster of one. The first method searches the similarity
matrix for the pair of entities with the highest degree of
association (e.g. largest correlation among the variables)
and groups these two entities. It then searches all
remaining clusters and groups those two clusters which are closest,
i.e. the correlation among their closest members is highest.
This step is repeated until there is but one cluster
remaining. This method is called the "single linkage" method by
Anderberg or the "connectedness" method by Johnson. The
names derive from the fact that each pair of clusters is joined
by the single shortest or strongest link (thus most strongly
connected) between them.

The  second  procedure,  called  complete  linkage,  is  the 
same  as  single  linkage  except  that  the  association  between 
groups  is  the  association  between  their  farthest  members. 
Johnson  calls  this  the  diameter  method  because  all  entities 
in  a cluster  are  linked  to  each  other  at  some  maximum  distance 
(or  diameter). 
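The difference between the two linkage rules can be seen in a small sketch. This is not the HI-CLUST program; the function name and the four-entity distance matrix are hypothetical, and the sketch works on distances (smaller means more strongly associated) rather than on correlations.

```python
def agglomerate(dist, linkage="single"):
    """Merge clusters until one remains and return the merge history.

    dist is a symmetric matrix (list of lists) of distances between entities.
    "single" links two clusters by their closest pair of members,
    "complete" links them by their farthest pair of members.
    """
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    pick = min if linkage == "single" else max
    clusters = [[i] for i in range(len(dist))]  # every entity starts alone
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = pick(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return history


# Hypothetical distance matrix for four entities (illustrative values only).
dist = [
    [0, 1, 4, 6],
    [1, 0, 2, 7],
    [4, 2, 0, 3],
    [6, 7, 3, 0],
]
print([d for _, _, d in agglomerate(dist, "single")])    # [1, 2, 3]
print([d for _, _, d in agglomerate(dist, "complete")])  # [1, 3, 7]
```

In this example single linkage chains entity 2 onto the first pair at distance 2, while complete linkage pairs entities 2 and 3 first and joins the two pairs only at the maximum distance of 7.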

Hierarchical clustering is usually not too enlightening
for the clustering of data units. The non-hierarchical
methods are more appropriate for classifying the data units
into a single classification of k clusters. The basic
concept in most of the non-hierarchical methods is to begin
with an initial partition of the data units and adjust the
cluster members to obtain a "best" partition.

The simplest and most common non-hierarchical clustering
procedure is that of centroid sorting. Beginning with the
initial partition of k clusters (each usually consisting of
one data unit) a new data unit is assigned to the cluster
with the nearest centroid by some sort of distance measure.
Centroids are recomputed after a data unit is assigned and
the procedure repeated for remaining data units. After all
data units are assigned, the entire procedure can be reapplied
to all data units over and over until there are no more
changes in cluster memberships, i.e. until convergence.
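The centroid sorting procedure can be sketched as follows. This is a simplified illustration, not the K-MEANS program: it assumes squared Euclidean distance, seeds the k clusters with the first k data units, and refuses to empty a cluster. The function name and the six data units are hypothetical.

```python
def centroid_sort(units, k, max_passes=100):
    """Assign data units to k clusters by repeated nearest-centroid passes."""
    dims = len(units[0])

    def sqdist(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y))

    def centroid(members):
        return [sum(units[i][d] for i in members) / len(members) for d in range(dims)]

    # Initial partition (an assumption): the first k units seed one cluster each.
    label = [None] * len(units)
    for c in range(k):
        label[c] = c
    centroids = [list(units[c]) for c in range(k)]

    for _ in range(max_passes):
        changed = False
        for i, u in enumerate(units):
            nearest = min(range(k), key=lambda c: sqdist(u, centroids[c]))
            if label[i] == nearest:
                continue
            old = label[i]
            if old is not None and label.count(old) == 1:
                continue  # do not move the last member out of a cluster
            label[i] = nearest
            changed = True
            # Recompute only the centroids affected by this reassignment.
            for c in (old, nearest):
                if c is not None:
                    members = [j for j, lab in enumerate(label) if lab == c]
                    centroids[c] = centroid(members)
        if not changed:  # convergence: a full pass with no membership change
            break
    return label, centroids


# Hypothetical data units in two variables (illustrative values only).
units = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [1.0, 0.0], [9.0, 10.0], [10.0, 9.0]]
labels, centers = centroid_sort(units, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

The result of such a procedure generally depends on the initial partition and on the order in which data units are visited, which is one reason the more elaborate criteria discussed next are of interest.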

There are more complex methods than the centroid methods
for clustering data units and these are based on multivariate
statistical analysis techniques. The scatter of two variables
is the inner product of two centered score vectors. The
scatter matrix T is a square matrix that has the entry $t_{ij}$
which is the scatter of variables i and j computed over all
the data units. Each of the h clusters has its own scatter
matrix $W_k$ computed over the data units in the k-th cluster.
The within groups scatter matrix is given by
$W = \sum_{k=1}^{h} W_k$.
The between groups scatter matrix is denoted by B. An
element $b_{ij} = \sum_{k=1}^{h} n_k \bar{x}_{ik} \bar{x}_{jk}$, where $n_k$ is the number of data
units in the k-th cluster and $\bar{x}_{ik}$ is the mean (centered around
the grand mean in the entire data set) of the i-th variable
in the k-th cluster. The three scatter matrices can be
shown to satisfy the relation T = B + W.

An important element in many clustering criteria is
the determinantal equation $|B - \lambda W| = 0$. The eigenvalues
of the matrix $W^{-1}B$ provide the solutions $\lambda_i$ to this equation.
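The relation T = B + W among the scatter matrices can be checked numerically. The sketch below is only an illustration under stated assumptions: the function name, the six data units, and the cluster labels are hypothetical, and plain Python lists are used instead of any particular matrix library.

```python
def scatter_matrices(units, labels):
    """Return the total (T), within groups (W) and between groups (B) scatter matrices."""
    m = len(units[0])
    grand = [sum(u[d] for u in units) / len(units) for d in range(m)]

    def scatter(rows, center):
        # Sum of outer products of the rows after centering at `center`.
        s = [[0.0] * m for _ in range(m)]
        for r in rows:
            dev = [r[d] - center[d] for d in range(m)]
            for i in range(m):
                for j in range(m):
                    s[i][j] += dev[i] * dev[j]
        return s

    T = scatter(units, grand)
    W = [[0.0] * m for _ in range(m)]
    B = [[0.0] * m for _ in range(m)]
    for k in sorted(set(labels)):
        rows = [u for u, lab in zip(units, labels) if lab == k]
        mean_k = [sum(r[d] for r in rows) / len(rows) for d in range(m)]
        W_k = scatter(rows, mean_k)  # scatter within cluster k
        dev = [mean_k[d] - grand[d] for d in range(m)]  # cluster mean about grand mean
        for i in range(m):
            for j in range(m):
                W[i][j] += W_k[i][j]
                B[i][j] += len(rows) * dev[i] * dev[j]
    return T, W, B


# Hypothetical data units and cluster labels (illustrative values only).
units = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [6.0, 8.0], [7.0, 9.0], [8.0, 7.0]]
labels = [0, 0, 0, 1, 1, 1]
T, W, B = scatter_matrices(units, labels)
print(all(abs(T[i][j] - (B[i][j] + W[i][j])) < 1e-9
          for i in range(2) for j in range(2)))  # True
```

Because T is fixed by the data, any partition that reduces W must increase B by the same amount, which is why criteria stated in terms of W alone are sufficient.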

D. J. McCrae has developed a FORTRAN IV computer program
called K-MEANS which utilizes these concepts to cluster the
data into k clusters. He provides for four possible criteria
for determining when assignment of a data unit to a
particular cluster results in the "best" partition of the data set.
These criteria are: (1) minimize the trace of W; (2) maximize
the largest eigenvalue of $W^{-1}B$; (3) maximize the trace of
$W^{-1}B$; and (4) minimize the ratio of the determinants $|W|/|T|$.
This last criterion is more commonly known as Wilks' Lambda
statistic. Since T is the same for all partitions, this is
equivalent to minimizing det W. The last procedure was the
one used to cluster the data units in this particular analysis.

McCrae's K-MEANS also allows three choices of distance
measures between clusters. These are Euclidean distance,
scaled Euclidean distance, and Mahalanobis distance.
Assuming normal populations, $N(\theta_i, \Sigma_i)$, with equal covariance
matrices $\Sigma_1 = \Sigma_2 = \ldots = \Sigma$ so that the populations differ
only in location, the Mahalanobis distance between the
populations is given by $D^2 = (\theta_i - \theta_j)^T \Sigma^{-1} (\theta_i - \theta_j)$. This
was the distance measure used in this cluster analysis.

The  question  of  how  many  clusters  are  present  in  the 
data  was  mentioned  in  the  previous  section.  It  can  be  shown 
that  one  prime  indicator  of  the  discriminability  of  variables 
in  the  data  set  is  given  by  the  log  of  the  ratio  det  T/det  W. 
When  this  quantity  is  plotted  against  the  number  of  clusters 
one  can  gain  insight  as  to  the  appropriate  number  of  clusters 
within  the  data  set.  As  the  number  of  clusters  is  increased 
the  ratio  begins  to  reach  a stabilizing  value  indicating 
that  the  discriminability  of  the  data  is  decreasing.  Thus, 
one  can  approximate  the  maximum  number  of  natural  clusters 
by  observing  when  the  curve  levels  off.  It  should  be 
reemphasized that it is a primary objective of most cluster
analysis  problems  to  produce  a set  of  clusters  that  are  well 
differentiated  from  each  other. 
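The plotted quantity is tied directly to the fourth K-MEANS criterion. As a short worked relation (standard matrix algebra, not taken from the program documentation), write $\Lambda = |W|/|T|$ for Wilks' Lambda and $\lambda_i$ for the eigenvalues of $W^{-1}B$:

```latex
\log\frac{\det T}{\det W}
  = -\log\Lambda
  = \log\det\left(I + W^{-1}B\right)
  = \sum_{i} \log\left(1 + \lambda_i\right),
\qquad \text{since } T = B + W .
```

A larger value of the plotted quantity therefore corresponds to a smaller Wilks' Lambda, that is, to clusters that are better separated relative to their internal scatter.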

As  stated  before,  when  cluster  analyses  are  performed 
on  data  with  several  variables  actually  measuring  the  same 
characteristic,  it  might  be  profitable  to  reduce  the  problem 
to  one  of  only  a few  primary  variables  by  the  techniques  of 
principal  components  analysis. 


20 Op. Cit., Press, S. J., p. 372-323

For this analysis, the computer program BIMED01M was
utilized to extract the first principal component from the
first two variables and the first principal component from
the fifth, sixth, seventh and eighth variables. BIMED01M
performs the following four basic steps: (1) the data are
normed and centered; (2) the correlation matrix of the
centered and normed data is computed; (3) the eigenvalues
and corresponding eigenvectors of the correlation matrix
are calculated; and (4) the centered and normed data are
transformed into their orthogonal components.
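The four steps can be sketched for the first principal component only. This is an illustration under stated assumptions, not the BIMED01M algorithm: power iteration is substituted here for a full eigenvalue routine, the sample standard deviation (divisor n - 1) is assumed, and the function name and data are hypothetical.

```python
import math


def first_principal_component(data, iterations=500):
    """Return (eigenvalue, eigenvector, scores) of the first principal component."""
    n, m = len(data), len(data[0])
    # Step 1: center and norm each variable.
    means = [sum(row[j] for row in data) / n for j in range(m)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
           for j in range(m)]
    z = [[(row[j] - means[j]) / sds[j] for j in range(m)] for row in data]
    # Step 2: correlation matrix of the standardized data.
    corr = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(m)]
            for i in range(m)]
    # Step 3: dominant eigenvector by power iteration (a swapped-in technique).
    # The uneven starting vector avoids starting exactly on a lesser eigenvector.
    v = [1.0 / (j + 1) for j in range(m)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(m))
                     for i in range(m))
    # Step 4: transform the standardized data onto the component.
    scores = [sum(r[j] * v[j] for j in range(m)) for r in z]
    return eigenvalue, v, scores


# Hypothetical data on two strongly correlated variables (illustrative values only).
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]
eigenvalue, vector, scores = first_principal_component(data)
print(round(eigenvalue, 3), [round(x, 4) for x in vector])
```

For two standardized variables with correlation r, the first component weights both variables equally and has eigenvalue 1 + r, so a single score retains most of the information when r is close to one.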


21 BMD Manual, Biomedical Computer Programs, Health
Sciences Computing Facility, UCLA, University of California
Press, 1973, p. 193-201


VI. RESULTS OF ANALYSIS


The  two  data  groups,  control  and  accident,  were  first 
investigated  by  discriminant  analysis  with  the  use  of  the 
computer  program  BIMED04M. 

The  test  for  equality  of  group  covariance  matrices  (or 
equivalently,  group  dispersion  matrices)  was  performed 
according  to  the  procedure  developed  by  G.  E.  P.  Box  and 
illustrated in Appendix D. They were found to be equal at
the .10 level of significance so it was appropriate to apply
the  discriminant  analysis  procedures. 

Testing for the equality of group means, BIMED04M
computed an F statistic of 8.94. For the $\alpha = .001$ level of
significance, the tabled F value is
$F_{m,\,n_1+n_2-m-1}(1-\alpha) = F_{10,\,50+50-10-1}(.999) = F_{10,89}(.999) \approx 3.39$
and one can conclude
that there is definitely a difference in location of group
means.

The computed discriminant function coefficients were
(-0.00152, 0.00001, -0.00035, 0.00360, -0.00988, 0.00685,
0.00218, -0.00191, -0.00245, 0.00231). If after applying
the coefficients to a data unit vector $X_j$,
$-0.00152 X_{j1} + 0.00001 X_{j2} + \ldots + 0.00231 X_{j,10} \le 0$, then
data unit j is assigned to group number two. Otherwise,
the data unit is assigned to group number one.
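The assignment rule can be sketched as a short function. The coefficients are those listed in the text. The cutoff is left as a parameter and defaults to zero, which is an assumption and should be replaced by whatever value the program actually used. The two data unit vectors are hypothetical and chosen only to show both outcomes.

```python
# Discriminant function coefficients as listed in the text (variables 1 to 10).
COEFFICIENTS = [-0.00152, 0.00001, -0.00035, 0.00360, -0.00988,
                0.00685, 0.00218, -0.00191, -0.00245, 0.00231]


def discriminant_score(x):
    """Weighted sum of the ten variables for one data unit."""
    if len(x) != len(COEFFICIENTS):
        raise ValueError("a data unit must have ten variables")
    return sum(c * v for c, v in zip(COEFFICIENTS, x))


def classify(x, cutoff=0.0):
    """Return 2 when the score is at or below the cutoff, otherwise 1.

    The default cutoff of zero is an assumption made for this sketch.
    """
    return 2 if discriminant_score(x) <= cutoff else 1


# Hypothetical data units (not taken from Appendix A).
high_total_hours = [0, 1000, 0, 0, 0.0, 0.0, 0.0, 0.0, 0, 0]
heavy_last_day = [0, 0, 0, 0, 2.0, 0.0, 0.0, 0.0, 0, 0]
print(classify(high_total_hours))  # 1
print(classify(heavy_last_day))    # 2
```

Keeping the cutoff as an explicit argument makes it easy to examine how the two misclassification rates trade off against each other when the cutoff is moved.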

Those  subjects  who  had  high  values  for  the  variables 
with positive coefficients and low values for the variables
with negative coefficients were classified as being in the
control  group,  and  those  with  opposite  attributes  were 
classified  as  belonging  to  the  accident  group. 

The  discriminant  function  was  applied  to  the  original 
data  units  to  determine  the  performance  of  the  function. 
Fifteen  subjects  of  the  fifty  in  the  control  group  were 
classified  as  being  in  the  accident  group,  while  only  four 
of  the  fifty  in  the  accident  group  were  classified  as  being 
in  the  control  group.  It  is  important  to  observe  that 
although  the  overall  misclassification  rate  is  nineteen 
percent,  the  misclassification  rate  of  the  original  accident 
group  is  only  eight  percent.  This  is  encouraging.  The 
question  of  identifying  correctly  those  in  the  accident 
group  is  of  greater  concern  than  that  of  misclassifying 
those  individuals  in  the  control  group. 
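The rates quoted above follow from simple counts, as the short calculation below shows. The variable names are chosen for this illustration only; the counts are those reported in the text.

```python
# Counts reported in the text for the original one hundred data units.
control_total, accident_total = 50, 50
control_misclassified = 15   # control subjects classified as accident
accident_misclassified = 4   # accident subjects classified as control

overall_rate = (control_misclassified + accident_misclassified) / (
    control_total + accident_total)
control_rate = control_misclassified / control_total
accident_rate = accident_misclassified / accident_total

print(f"overall:  {overall_rate:.0%}")   # overall:  19%
print(f"control:  {control_rate:.0%}")   # control:  30%
print(f"accident: {accident_rate:.0%}")  # accident: 8%
```

The overall rate hides a marked asymmetry: the function errs far more often by placing control subjects in the accident group than the reverse.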

To obtain the preceding results, it should be noted
that the discriminant analysis was performed on the raw data
as listed in Appendix A. An analysis was also performed on
the standardized data, listed in Appendix B, but the results
were much poorer. Using standardized data, the overall
misclassification rate was fifty-five percent, quite a loss
of discriminating power. It should be recognized that
standardizing data has the drawback of providing answers to
a problem different from the one originally posed.

In addition to learning the misclassification rates, it
was also desired to determine which variables had the
strongest effect on classifying the data units and in which
direction the effect was observed. The discriminant function
coefficients indicate whether each variable has a positive
or negative effect, but because of the difference in
magnitudes of the variables, the discriminant function coefficients
alone do not tell how much of an effect. It is of interest,
therefore, to compare how much a one standard deviation
change in each variable will affect the discriminant function.
Table I presents the standard deviations of each variable in
the second column, the discriminant function coefficients in
the third column, and in the last column the effect on the
discriminant function of a one-sigma change in each variable.

TABLE I

Variable    Standard     Disc. Funct.    Effect of a
            Deviation    Coefficient     1σ Change

    1          6.03       -0.00152        -0.00916
    2       1390.00        0.00001         0.01390
    3         28.20       -0.00035        -0.00987
    4          2.99        0.00360         0.01080
    5          1.42       -0.00988        -0.01400
    6          1.78        0.00685         0.01219
    7          0.68        0.00218         0.00146
    8          1.03       -0.00191        -0.00248
    9          5.14       -0.00245        -0.01259
   10          2.45        0.00231         0.00565
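The last column of Table I is the product of the standard deviation and the coefficient for each variable. The sketch below recomputes that product from the printed standard deviations and the coefficients listed in the text; because the printed figures are rounded, the products need not reproduce the printed column exactly. The names used are hypothetical.

```python
# (standard deviation, discriminant coefficient) as printed, by variable number.
TABLE_I = {
    1: (6.03, -0.00152),
    2: (1390.00, 0.00001),
    3: (28.20, -0.00035),
    4: (2.99, 0.00360),
    5: (1.42, -0.00988),
    6: (1.78, 0.00685),
    7: (0.68, 0.00218),
    8: (1.03, -0.00191),
    9: (5.14, -0.00245),
    10: (2.45, 0.00231),
}


def one_sigma_effects(table):
    """Effect on the discriminant function of a one standard deviation change."""
    return {var: sd * coef for var, (sd, coef) in table.items()}


effects = one_sigma_effects(TABLE_I)
strongest_positive = max(effects, key=effects.get)
strongest_negative = min(effects, key=effects.get)

# List the variables from the largest to the smallest absolute effect.
for var in sorted(effects, key=lambda v: abs(effects[v]), reverse=True):
    print(f"variable {var:2d}: {effects[var]:+.5f}")
print(strongest_positive, strongest_negative)  # 2 5
```

Ranking by absolute effect puts the variables on a common footing, which the raw coefficients alone cannot do because the variables are measured on very different scales.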

The  results  of  Table  I indicate  that  variable  two  (total 
hours)  has  the  strongest  positive  effect  in  classifying  a 
subject  as  not  being  in  the  accident  group.  A surprising 
result,  however,  is  that  variable  five  (time  flown  in  the 
last  twenty-four  hours)  has  the  strongest  negative  effect 
while  variable  six  (time  flown  in  the  last  forty-eight  hours) 
has  a strong  positive  effect.  This  would  suggest  that  flying 
every other day is beneficial, but that too much flying (i.e.
every day) is detrimental. Similar interpretations can be
made  for  the  remaining  variables  although  their  effects  are 
less  pronounced. 

Although an overall misclassification rate of nineteen
percent tends to indicate that there are meaningful
differences between the two groups, the classification capabilities
of the discriminant function are not as sharp as one would
like. One cannot say with assurance how a pilot not
initially a member of either group should be classified.
It was desired to learn more about the variables' effects
to be able to apply conclusions to subjects beyond the range
of the data. To do this, a second method of analysis was
employed: that of cluster analysis.

The first type of cluster analysis used was hierarchical
clustering by data units. The computer program HI-CLUST was
used with the Euclidean distance measure between data units as
the indicator of association. The results from both the
single linkage and complete linkage methods were not at
all satisfactory. When clustered into the final two groups,
one cluster consisted of ninety-nine units and the other of
a single unit. The cluster of ninety-nine units was composed
of clusters of ninety-three units and six units, again
shedding no light on relation to accidents. Therefore, the
clustering on data units was reworked using the
non-hierarchical techniques of the computer program K-MEANS.

Initially,  the  one  hundred  data  units  were  clustered 
into  two  groups  to  ascertain  if  there  was  any  association 
directly  with  the  two  original  groups,  control  and  accident. 
Unfortunately,  there  did  not  appear  to  be  any  association, 
as  cluster  number  one  contained  thirty-three  subjects  from 
the  control  group  and  thirty-seven  from  the  accident  group 
while  cluster  number  two  had  seventeen  and  thirteen, 
respectively. 
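The composition of each cluster can be expressed as the proportion of its members drawn from the accident group. The short calculation below uses the counts just given; the dictionary layout is chosen for this illustration only.

```python
# (control subjects, accident subjects) in each cluster, as reported in the text.
clusters = {1: (33, 37), 2: (17, 13)}

accident_share = {}
for number, (control, accident) in clusters.items():
    accident_share[number] = accident / (control + accident)
    print(f"cluster {number}: {control + accident} units, "
          f"{accident_share[number]:.0%} from the accident group")
# cluster 1: 70 units, 53% from the accident group
# cluster 2: 30 units, 43% from the accident group
```

With fifty subjects in each original group, a cluster unrelated to accident status would be expected to show a share near fifty percent, so both clusters depart only modestly from that baseline.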

Figure  (3)  graphically  depicts  the  cluster  means  of  the 
two-group  cluster  results,  and  the  number  of  subjects  in  the 
clusters.  It  is  interesting  to  note  that  fifty-three  percent 
of  cluster  number  one  was  composed  of  subjects  from  the 
accident  group  while  only  forty-three  percent  of  cluster 
number  two  was  from  the  accident  group.  By  inspecting  the 
cluster  means  of  variables  one  and  two,  one  can  see  that  the 
cluster  compositions  are  inversely  related  to  total  experience, 
i.e.  cluster  number  one  has  higher  accident  composition  and 
fewer years as a designated naval aviator and fewer total hours.
The same kind of relation is seen to apply to the recency and
frequency variables (variables four through ten) but the
separation is not as great. Cluster number one, which has the
higher accident composition, has cluster means which indicate
more  recent  and  frequent  flying  than  the  subjects  of  cluster 
number  two.  Again,  the  results  here  are  in  basic  agreement 
with  those  of  discriminant  analysis  in  that  they  indicate 
less  frequent  flying  is  beneficial.  But  of  course,  the 
support  is  very  thin  and  the  results  are  far  from  conclusive, 
especially  since  the  cluster  means  are  seen  to  be  relatively 
close  for  all  of  variables  four  through  ten. 

It  was  stated  in  Section  V that  it  is  possible  to  get 
a rough  idea  of  the  number  of  natural  clusters  present  in 
the  data  by  plotting  log(det  T / det  W)  versus  the  number  of 
clusters.  Figure  (4)  is  a plot  of  this  information  for  the 
data under study. As the number of groups is increased the
curve begins to level off. It appears that beyond nine groups
there is not much additional information to be gained by
grouping  further. 

The  primary  interest  lies  in  the  analysis  of  two  groups, 
since  there  were  two  groups  initially,  and  in  the  analysis 
of  the  natural  number  of  groups.  Between  two  and  nine  groups 
the  results  are  believed  to  be  less  useful. 

Figure (5) graphically portrays the cluster means of the
nine cluster results, and the number of data units in each
cluster. The relationships among clusters here are not
apparent and there is no one-to-one correspondence such as
an inverse relation between the cluster means of total hours
flown and composition of clusters by accident percentages.
It is desirable, therefore, to plot the proportion of each
cluster from the accident group versus the cluster means
for each variable. By so doing, trends might appear and
factors influencing the accident proportions might become
more readily observable. These plots are depicted in Appendix
E as Figures (10) through (19). Figures (10) through (19)
are similar in that none of them reveal any prominent
relationships that their respective variables have with the
proportion of the clusters composed of accident subjects.
Intuitively, one might have hypothesized that as the cluster
means increased (as in Fig. (11) for instance) the
proportion of the clusters composed of accident units would
decrease. Since this kind of relationship did not appear
for the total hours variable, nor did similarly anticipated
relations hold for the other variables, a final type of
analysis was performed on the data.

[Figure (4): log (det T / det W) plotted against the number of clusters]

It seemed plausible that although the variables
individually did not reflect the contributions they had upon
accidents, certain variables collectively might demonstrate
such an effect. To determine which variables to combine, a
hierarchical  clustering  analysis  was  performed  by  the  computer 
program  HI-CLUST.  Product-moment  correlation  was  used  as 
the  association  measure  between  variables.  Both  the  methods 
of  single  linkage  and  complete  linkage  clustering  as  discussed 
in  Section  V were  employed.  The  results  are  shown  as 
hierarchical  trees  (dendrograms)  in  Figures  (6)  and  (7). 

The  results  of  both  hierarchical  methods  are  similar. 
Variables  number  one  and  two  are  highly  correlated  and 
variables  five,  six,  seven  and  eight  are  highly  correlated. 
Therefore,  it  was  decided  to  combine  those  respective  variables, 
calling  the  first  the  experience  variable  and  the  second  the 
frequency  variable.  In  order  to  eliminate  all  unnecessary  or 
distracting  influences  it  was  also  considered  prudent  to 
eliminate  variables  nine  and  ten  since  very  few  accidents 
involved  carrier  landings  and  many  subjects  in  both  groups 
were  not  involved  in  carrier  operations  during  the  period 
investigated. 

As discussed in Sections IV and V, BIMED01M was used to
extract the first principal components from those
combinations of variables listed above to obtain the total
experience variable and the frequency variable. The principal
components  (exhibited  in  Appendix  G)  were  extracted  from  the 
standardized  data  (exhibited  in  Appendix  B)  as  required  by 
BIMED01M. 

With  the  data  reduced  to  four  main  variables,  the  cluster 
analysis  program  K-MEANS  was  again  used  to  investigate  the 
data.  As  was  done  with  the  data  in  ten  variables  (or 
regular  space)  the  data  was  first  investigated  by  clustering 
into  just  two  groups.  The  cluster  means  are  depicted  in 
Figure  (8).  As  was  true  in  the  regular  space  analysis,  there 
does  not  appear  to  be  any  association  between  clustering  of 
data  units  and  membership  in  the  accident  group.  Again, 
there  were  thirty-three  subjects  from  the  control  group  and 
thirty-seven  from  the  accident  group  in  cluster  number  one, 
and  seventeen  and  thirteen  respectively  in  cluster  number 
two.  Thus,  fifty-three  percent  of  cluster  number  one  was 
from  the  accident  group  and  forty-three  percent  of  cluster 
number  two  was  from  the  accident  group. 

The  graph  of  log  (det  T / det  W)  versus  number  of 
clusters  was  plotted  for  the  reduced  space  analysis  in 
Figure  (9)  and  was  also  found  to  indicate  that  beyond  nine 
clusters,  minimal  information  is  gained.  Therefore,  a plot 
of  the  proportion  of  clusters  from  the  accident  group  versus 
the  cluster  means  was  constructed  for  each  of  the  four 
variables  in  the  reduced  space  with  nine  cluster  groupings. 
Figures (20) through (23) in Appendix H are graphs of the
results.

The resultant plots in Appendix H are not too informative.
Three of the four "new" variables do not appear to reveal any
structure, but the first, the experience variable, may have
some interest. A parabolic fit has been drawn in freehand
and the accident rate seems to bottom out for experience in
the interval (-1,0). This is misleading, however. The
interval (-1,0) of the experience variable corresponds to
values of $X_1$ and $X_2$ which are between modes of their
respective distributions. Only seven of the one hundred aviators
are in this range.


VII.  CONCLUSIONS 


The  three  analytical  methodologies  employed  in  this 
investigation  were  primarily  utilized  as  exploratory  tools 
to  determine  if  there  were  significant  differences  in  the 
various  flight  time  statistics  recorded  for  sample  groups 
of  pilots  with  and  without  accidents.  The  discriminant 
analysis  techniques  provided  the  best  indication  that  there 
were  differences  which  could  be  used  to  categorize  the 
pilots  according  to  the  probability  of  belonging  to  the 
accident  group. 

It should be recognized that failure to distinguish
among pilots according to their flight statistic attributes
is not necessarily a fault of the analytical procedures, but
may be an inherent inability of the data as currently conceived to
discriminate among subjects. This does not suggest, however,
that this approach to accident analysis has no merit. It
does point out the need to expand the investigation to
include more quantitative aspects of flying. Many other
variables  such  as  instrument  time,  synthetic  trainer  time, 
number  of  instrument  approaches,  average  time  spent  briefing 
flights,  and  subjective  attributes  such  as  training  command 
flight  grades  and  NATOPS  quiz  grades  could  be  included. 
Breaking the investigation down into more restrictive
areas, such as including only accidents in a particular phase
of flight, or including only accidents by a particular type
of aircraft such as attack or patrol, might also prove to
be more relevant. It should be informative to expand the
time span of the data base to include five or ten years so
as to have a larger sample size on which to base results.
Also, enlarging the size of the control group would help
to eliminate the effects of non-randomness which could bias
the data.

Despite  the  fact  that  the  data  investigated  in  this 
analysis  did  not  contain  those  characteristics  which  could 
identify  the  underlying  accident  generating  mechanism,  it 
is  still  considered  worthwhile  to  pursue  the  basic  ideas 
developed  here  in  future  accident  analysis. 
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TABLE  II  (Continued) 


            3      4      5      6      7      8      9     10

  2673     76     02    2.?    4.0    1.3    1.9     10     03
  1199     14     15    0.0    0.0    0.1    0.1     00     00
  1143     38     05    0.3    0.4    0.1    0.1     00     00
  0703     10     10    0.5    0.9    0.3    0.5     00     00
  1000     46     05    0.5    0.8    0.7    0.9     00     00
  1769     61     02    1.2    1.9    0.6    0.9     02     03
  4172     31     04    1.5    3.1    0.9    1.4     00     00
  0528     13     13    1.2    2.3    0.6    1.0     00     00
  4843     82     02    1.6    2.9    0.8    1.1     06     03
  5123     23     05    0.8    1.6    0.6    0.9     01     01
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TABLE  III  - Ac 


Variables 


1 0405 

6 2232 
5 1760 

3 0918 
1 0421 

1 0150 
12  2860 
3 1121 
22  3W 
7 2760 
5 1724 

5 0654 

7 1462 

1 0405 

12  3204 

0919 
1142 

0357 
1085 
0679 
4001 
1144 
9 2658 
3 1153 

6 1762 
0457 
1241 
1129 
3933 

18  4041 

5 1640 

12  3285 


18  4461 

5 1595 

0 0329 

3 0577 


5 0866 
1 2 246? 


12  3'?72 

3 0957 


v * ** 

5 1457 

18  4727 

5 1500 

2 0^94 

5 1484 

18  2939 

1 0513 


0 

0 

3 

1 

0 

1 

0 

0 

3 

1 

Q 

0 

1 

1 

o?Li 


APPENDIX  B 


TABLE  IV  - Control  Group  Standardized  Data 


Variables 


-1.0761 

1.1200 

1.2769 
0.6494 

1.2769 
0.9631 

-0.4486 
0.0220 
55 

92 
61 
24 
2.4 


-0.9192 
-1.0761 
-1.0761 
-1.0761 
-0.9192 
1.  6906 

1.2769 
31 
57 
55 
55 

1.07-1 


1.3095 
0.8506 

1.0101 


-0.0929 

-1.1850 

5 
4 
8 


1.7382 

1.1338 

0.9507 

0.8692 

2.3040 

-0.0769 

-1.0644 

-0.1563 

-0.0549 

-0.6256 

0.2710 

-0.^564 

-0.6670 


3 

3 


l.rn? 


• 3 


-1.1829 

-1.092*5 

-0.0126 

-0.0576 

-0.0576 

-0.8228 

-0.1926 

-1,0028 

0.48.75 

-0.2827 

0.2124 

-1.4529 

0.2575 

4.3804 

-0.0126 

0.3025 

-0.1926 

1.2477 


3777 
4775 


1.1957 
-1.1437 
-0.1  c4o 
-0.3679 
-0.3639^ 
-0.1040 
-0.104c 
0.9358 
-0.3639 
0.1560 
-0.6239 
3.0153 
-0.6239 
-O.8838 
-0.6239 
-0.8838 
-0.6239 
-O.8838 
-0.6239 
-0.6239 
1.9755 


-0,6239 
0.1560 
-0.6289 
-o.lc4o 
1.1957 
-0.3639 
-0.3639 
-0.3639 
-0.3639 
-0.6239 
-0.8838 
0.5758 
-0.3639 
-0.3639 
-0.67 3n 
3.0153 
0.1 CIO 
-0.1040 


0.1330 
5698 
3929 
9238 
1380 
1380 
1380 
5698 
21 


TABLE IV (Continued)


Variables

Obs.        6          7          8          9         10

  1      -0.8123    -0.6847    -0.7165    -0.3559    -0.3736
  2       3.6807     2.3281     2.5094     0.2181     0.9608
  3       0.1761    -0.4108    -0.2071    -0.3559    -0.3736
  4      -0.6326    -0.6847    -0.3769    -0.3559    -0.3736
  5       0.0863     2.6020     1.6605     1.3660    -0.3736
  6      -0.9921    -0.9686    -0.8862    -0.3559    -0.3736
  7      -0.0036    -0.4108    -0.5467    -0.3559    -0.3736
  8      -0.2732    -0.4108    -0.7165    -0.3559    -0.3736
  9       0.2660     0.1370    -0.0373    -0.3559    -0.3736
 10      -0.3630    -0.9586    -0.886?    -0.3559    -0.3736
 11      -0.1833     *.4109     0.6418    -0.3559    -0.3736
 12      -1.1718    -0.0586    -0.7165    -0.3559    -0.3736
 13       0.1761     0.6848     0.4720     1.3660     0.9608
 14       3.2314     1.7803     2.3396    -0.3559    -0.3736
 15      -0.0036     1.2325     0.9S13    -0.3559    -0.3736
 16       0.6254     0.9586     1.3209    -0,3^9     -0.3736
 17      -0.2732     0.9586     0.4720    -0.3559    -0.3736
 18      -0.0934     1.2325     0.8116    -0.35^9    -0.3736
 19       0.5356    -0.1369    -0.2071    -0.3559    -0.3736
 20      -0.3630     0.1370     0.1324    -0.3559    -0.3736
 21      -1.1718    -1.2325    -1.3956    -0.3559    -0.3736
 22      -0.6326    -0.6647    -0.7165     0.2181     0.9603
 23       0.1761     0.1970     1.3209    -0.3559    -0.3736
 24       0.0863    -0.4108    -0.2071    -o.3?59    -0.3736
 25      -0.0036     1.2325     1.3209     1.3660     0.9608
 26      -0.8123    -1.2325    -1.2258    -0,3^Q     -0.3736
 27       0.0863    -0.6847    -0.3769    -0.3559    -0.3736
 28       0.2660     0.9586     1.8302    -0.3559    -0.3736
 29      -0.2732    -0.4108    -rV;467    -0.35^9    -0.3736
 30      -0.°022    -1.2325    -1.2253    -0.3*59    -0.3736
 31      -0.2732    -0.0586    -0.8862    -0.3559    -0.3736
 32      -0.1*529    0.6*48     0.4720    -0.3559    -0.3736
 33      -0.1-33     0.6848     0.3022    -0.3550    -0.3736
 34       0.2660    -0.05%     -0.8862    -0.3559    -0.3736
 35       0.9043     0.6848     0.9^13     0.2181    -0.3736
 36      -1.0919    -0.6847    -0.8862    -0.3559    -0.3736
 37      -0.0022    -0.4108    -0,«467    -0.3559    -0.3736
 38      -1.2616    -1.5064    -1.3956    -0.3557    -0.3736
 39       0.7153     0.9586     0,641?    -0.3559    -0.3736
 40       0.0863    -0.4103    -0.2071    -0.3559    -0.3736
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TABLE  IV  (Continued) 


Variables

Obs.        1          2          3          4          5

 41      -0.2918     0.3704     1.7878    -0.8838     2.6156
 42      -0.4486    -0.6060    -1.0028     2.4954    -1.4547
 43      -0.7624    -0.6431     0.0774    -0.1040    -0.9238
 44      -0.7624    -0.9346    -1.1829     1.1957    -0.5698
 45      -0.9192    -0.7979     0.4375    -0.1040    -0.5698
 46      -0.7624    -0.2285     1.1127    -0.8838     0.6689
 47       1.1200     1.3633    -0.2377    -0.3639     1.1998
 48       1.2769    -1.0505    -1.0478     1.9755     0.6689
 49       1.2769     1.8078     2.0579    -0.8838     1.3768
 50       1.4337     1.9933    -0.5977    -0.1040    -0.0389

TABLE IV (Continued)


Variables

Obs.        6          7          8          9         10

 41       2.1531     1.7803     1.4907     5.3837     3.6296
 42      -1.4414    -1.5064    -1.5654    -0.3559    -0.3736
 43      -1.0819    -1.5064    -1.5654    -0.3559    -0.3746
 44      -0.6326    -0.9586    -0.8862    -0.3559    -0.3736
 45      -0.7225     0.1370    -0.2071    -0.3559    -0.3736
 46       0.2660    -0.1369    -0.2071     0.7921     3.6296
 47       1.3443     0.6848     0.6418    -0.3559    -0.3736
 48       0.6254    -0.1369    -0.0373    -0.3559    -0.3736
 49       1.1646     0.4109     0.1324     3.0879     3.6296
 50      -0.0036    -0.1369    -0.2701     0.2181     0.9608
TABLE V - Accident Group Standardized Data

                        Variables
Obs.         1         2         3         4         5
   1   -0.9374   -1.0456    0.1413   -0.4195   -0.8172
   2   -0.0531    0.4047    1.7109   -0.4195    0.0621
   3   -0.2299    0.0300    0.5035    1.8479   -0.5069
   4   -0.5836   -0.6384    0.4431   -0.4195   -0.9724
   5   -0.9374   -1.0329   -0.2813   -0.4195   -0.5793
   6   -0.9374   -1.2480    0.1413   -0.4195   -0.6103
   7    1.0081    0.9032   -0.8548   -0.9863    0.8379
   8   -0.5836   -0.4772    0.0309   -0.4195   -0.9724
   9    2.7767    1.4009   -1.7000   -0.9863   -0.4034
  10    0.1238    0.8238    0.9261    1.2810   -0.9724
  11   -0.2299    0.0014    0.5639   -0.9863    0.9414
  12   -0.2299   -0.8479    1.0166   -0.4195    0.4241
  13    0.1238   -0.2065    0.1715   -0.4195    0.7562
  14   -0.9374   -1.0456   -0.5832   -0.9863   -0.1965
  15    1.0081    1.1762    1.8014    0.7142    1.4586
  16   -0.9374   -0.6376    1.4090   -0.9863    1.2517
  17   -0.5836   -0.4606    0.9264    2.9815   -0.9724
  18   -0.9374   -1.0837   -0.2813   -0.4195   -0.4552
  19   -0.5836   -0.5058    1.7713   -0.4195    0.0103
  20   -0.7605   -0.8281    0.4733    0.7142   -0.9724
  21    2.0693    1.8089    0.1111    0.7142   -0.9724
  22   -0.4068   -0.4590   -0.9756    0.7142    3.8378
  23    0.4775    0.7428   -1.0058   -0.4195   -0.0931
  24   -0.5836   -0.4518    0.3827   -0.4195    0.7862
  25   -0.0531    0.0316   -0.4926    0.1474   -0.9724
  26   -0.9374   -1.0043    0.5940   -0.4195    0.8379
  27   -0.4068   -0.3320   -0.1304    0.1474   -0.0414
  28   -0.5836   -0.4709    1.1374    0.7142   -0.9724
  29    1.0081    1.7549   -1.3982    0.7142   -0.9724
  30    2.0693    1.8406   -0.0097    3.5484   -0.9724
  31   -0.2299   -0.0652   -1.4284   -0.9863    0.1183
  32    1.0081    1.2405   -1.6095   -0.4195   -0.3000
  33    2.0693    2.1740    0.3827    0.7142   -0.0724
  34   -0.2299   -0.1010    2.5862   -0.4195    0.7345
  35   -1.1142   -1.1059   -1.4284   -0.4195   -0.4552
  36   -0.5836   -0.9090    0.4733    0.7142   -0.9724
  37   -0.2299   -0.6796   -0.9152   -0.4195    1.2000
  38    1.0081    1.28?0    0.2620   -0.4195    0.7345
  39    1.0081    1.6271    1.0770   -0.4195    2.4413
  40   -0.5836   -0.6074   -0.7341    2.4147   -0.9724
  41   -0.5836   -0.2613   -1.247?   -0.4195   -0.4552
  42   -0.2299   -0.210?    0.7450   -0.4195    1.2000
  43    2.0693    2.9033         ?   -0.4195   -0.0031
  44         ?   -0.1764   -0.8540   -0.4195    0.2307
  45   -0.7605   -0.903?    0.2?16    0.1474   -0.9724
  46   -0.2299   -0.1391   -1.?793   -0.9863    0.7345
  47    2.0693    0.9659   -1.4??4   -0.4195   -0.6620
  48   -0.0531   -0.050?   -0.5228    0.7142   -0.9724
  49   -0.5996   -0.7945   -0.0863    0.0?21   (one value missing in source)
  50   -0.9374   -1.1757    0.0507   -0.9863    0.3207
TABLE V (Continued)

                        Variables
Obs.         6         7         8         9        10
   1   -0.5903   -0.1140   -0.5100    0.3115   -0.3842
   2   -0.3252   -0.1140   -0.5100   -0.3965   -0.3842
   3   -0.8113   -0.1140   -0.5100   -0.9628   -0.6797
   4    1.0883   -1.2536    0.9901   -0.8212   -0.6797
   5    1.4423    1.0256    1.7401   -0.9628   -0.6797
   6   -0.8907   -0.1140   -0.5100   -0.9628   -0.6797
   7    0.9120    1.0256    0.9901   -0.1133    0.2069
   8   -0.9880   -1.2536   -0.5100    0.5947   -0.0887
   9    0.3376   -0.1140    0.2400   -0.9628   -0.6797
  10   -1.2090   -1.2536   -1.2601    1.5858   -0.6797
  11    1.2213    1.0256    1.7401    1.4442    1.3890
  12    0.1608   -0.1140    0.9901    4.2761    4.0487
  13    1.1772    1.0256    0.9901    1.1611   -0.6797
  14    0.1167   -0.1140    0.2400    1.7274    1.6845
  15    0.8678   -0.1140   -0.5100   -0.3965    1.0935
  16    1.7958    1.0256    0.9901    0.7363    0.7979
  17   -1.2090   -1.2536   -1.2601    1.3026    2.8666
  18   -0.7671    1.0256    0.9901   -0.1133   -0.6797
  19   -0.3694   -0.1140   -0.5100    0.0283    0.5024
  20   -0.7671   -1.2536   -0.5100   -0.1133   -0.0887
  21   -0.6345   -0.1140   -0.5100   -0.1133   -0.6797
  22    2.9005   -0.1140   -0.5100   -0.9628   -0.6797
  23    0.2050    1.0256    1.7401   -0.9628   -0.6797
  24    1.2655    1.0256    0.9901    0.4531    0.7979
  25   -0.3252   -1.2536   -0.5100   -0.2549   -0.6797
  26    0.5535    1.0256    0.9901    0.4531   -0.3842
  27    0.0725   -0.1140    0.2400    0.5947   -0.6797
  28   -1.2090   -1.2536   -1.2601    0.3115    1.0935
  29   -1.2090   -1.2536   -1.2601   -0.9628   -0.6797
  30   -1.2090   -1.2536   -1.2601   -0.9628   -0.6797
  31    0.4260         ?    0.2400   -0.9628   -0.6797
  32   -0.6345   -0.1140   -0.5100   -0.9628   -0.6797
  33   -1.2090   -1.2536   -1.2601    0.3115    0.5024
  34    1.7516    2.1652    3.2402   -0.9628   -0.6797
  35   -0.7671   -0.1140         ?   -0.9628   -0.6797
  36   -1.2090   -1.2536   -1.2601   -0.2549   -0.3842
  37    0.6469    1.0256    0.2400   -0.8212   -0.6797
  38    0.2492   -0.1140   -0.5100    0.7363    0.5024
  39    1.7074    2.1653    0.9901    0.0283    0.5024
  40   -1.2090   -1.2536   -1.2601    0.8779    1.0935
  41   -0.7671   -0.1140   -0.5100   -0.1133   -0.3842
  42    0.6460    1.0256    0.2400    0.4531   -0.3842
  43   -0.4578   -0.1140   -0.5100    0.5947    1.0935
  44   -0.1043   -0.1140   -0.5100   -0.9628   -0.6797
  45   -0.0159   -1.2536   -0.5100    0.4531    1.0935
  46    0.2492    2.1653    0.9901   -0.9628   -0.6797
  47   -0.0438   -0.1140   -0.9628   -0.6797   (one value missing in source)
  48   -1.2090   -1.2536   -1.2601    0.0047   -0.638?
  49    0.7705   -0.1140    0.2400   -0.1133   -0.6797
  50    0.4702    1.0256    0.9901   -0.9628   -0.6797
APPENDIX C

DISPERSION MATRICES

An element of the group dispersion matrix is given by
the formula

\[ \frac{1}{N-1} \sum_{k=1}^{50} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j) \]

where i = 1, 2, ..., 10 and j = 1, 2, ..., 10. An element of
the dispersion matrix is readily seen to differ from an
element of the covariance matrix only by a constant factor
that depends on N, where N is the number of data units or
observations.

An element of the pooled within groups dispersion matrix
is given by the formula

\[ \frac{1}{N_1 + N_2 - 2} \sum_{l=1}^{2} \sum_{k=1}^{50} (x_{ikl} - \bar{x}_{il})(x_{jkl} - \bar{x}_{jl}) \]

where i = 1, 2, ..., 10 and j = 1, 2, ..., 10, and N_1 and N_2
are the number of observations of the respective groups.
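
As an illustration only, the two formulas above can be sketched in Python. The function names and the small data set are hypothetical and are not taken from the thesis; the sketch assumes the divisor N - 1 for a group matrix and N1 + N2 - 2 for the pooled matrix.

```python
def mean_vector(rows):
    """Column means of a list of equal-length observation rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dispersion_matrix(rows):
    """Group dispersion matrix: cross-products of deviations divided by N - 1."""
    n = len(rows)
    means = mean_vector(rows)
    m = len(means)
    d = [[0.0] * m for _ in range(m)]
    for row in rows:
        dev = [x - mu for x, mu in zip(row, means)]
        for i in range(m):
            for j in range(m):
                d[i][j] += dev[i] * dev[j]
    return [[v / (n - 1) for v in r] for r in d]


def pooled_within_dispersion(groups):
    """Pooled within-groups dispersion: within-group cross-products
    summed over groups and divided by N1 + N2 - 2."""
    m = len(groups[0][0])
    total_df = sum(len(g) - 1 for g in groups)
    pooled = [[0.0] * m for _ in range(m)]
    for g in groups:
        s = dispersion_matrix(g)
        for i in range(m):
            for j in range(m):
                pooled[i][j] += (len(g) - 1) * s[i][j]
    return [[v / total_df for v in r] for r in pooled]


# Made-up illustration data (not the thesis data): two groups, two variables.
control = [[1.0, 2.0], [2.0, 4.0], [3.0, 9.0]]
accident = [[2.0, 1.0], [4.0, 3.0], [6.0, 2.0]]
s1 = dispersion_matrix(control)
s2 = dispersion_matrix(accident)
sw = pooled_within_dispersion([control, accident])
```

When the two groups are the same size, as with the two groups of fifty pilots, each pooled element reduces to the simple average of the two group elements. For example, the (1,1) elements 41.47 and 32.62 of Tables VI and VII average to about 37.05, the (1,1) element of the pooled matrix.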
TABLE VI - Control Group Dispersion Matrix

                                   Variable
Variable            1             2             3             4             5
    1           41.47       7700.41        -25.90          0.71          0.58
    2         7700.41    2325433.00        827.55       -608.69         95.87
    3          -25.90        827.55        503.67        -52.11          4.87
    4            0.71       -608.69        -52.11         15.10         -0.95
    5            0.58         95.87          4.87         -0.95          0.33
    6            1.21        249.13         12.76         -2.00          0.61
    7            0.05         -9.39          3.48         -0.84          0.14
    8            0.29         24.17          5.74         -1.30          0.23
    9            0.78        538.38         14.62         -1.76          0.48
   10            0.22        235.05          6.04         -0.83          0.21

                                   Variable
Variable            6             7             8             9            10
    1            1.21          0.05          0.29          0.78          0.22
    2          249.13         -9.39         24.17        538.38        235.05
    3           12.76          3.48          5.74         14.62          6.04
    4           -2.00         -0.84         -1.30         -1.76         -0.83
    5            0.61          0.14          0.23          0.48          0.21
    6            1.26          0.30          0.52          0.77          0.32
    7            0.30          0.14          0.21          0.27          0.07
    8            0.52          0.21          0.35          0.34          0.09
    9            0.77          0.27          0.34          3.10          1.11
   10            0.32          0.07          0.09          1.11          0.57
TABLE VII - Accident Group Dispersion Matrix

                                   Variable
Variable            1             2             3             4             5
    1           32.62       6776.32        -31.67          1.43         -0.45
    2         6776.32    1619479.00      -2719.64        420.05         -9.87
    3          -31.67      -2719.64       1119.93          9.76          7.69
    4            1.43        420.05          9.76          3.18         -1.23
    5           -0.45         -9.87          7.69         -1.23          3.31
    6           -1.46       -322.23         11.15         -1.97          3.82
    7           -0.32        -66.75          0.93         -0.93          1.17
    8           -1.33       -346.63          6.47         -1.43          1.29
    9           -6.55      -1304.87         86.96          0.60         -0.37
   10           -1.42       -269.94         44.58          0.86          0.29

                                   Variable
Variable            6             7             8             9            10
    1           -1.46         -0.32         -1.33         -6.55         -1.42
    2         -322.23        -66.75       -346.63      -1304.87       -269.94
    3           11.15          0.93          6.47         86.96         44.58
    4           -1.97         -0.93         -1.43          0.60          0.86
    5            3.82          1.17          1.29         -0.37          0.29
    6            5.23          1.35          2.24         -0.85         -0.10
    7            1.35          0.79          0.97         -0.96         -0.44
    8            2.24          0.97          1.81         -0.13         -0.19
    9           -0.85         -0.96         -0.13         50.90         19.04
   10           -0.10         -0.44         -0.19         19.04         11.68
TABLE VIII - Pooled Within Groups Dispersion Matrix

                                   Variable
Variable            1             2             3             4             5
    1           37.05       7238.36        -28.78          1.07          0.06
    2         7238.36    1972456.00       -946.04        -94.32         43.00
    3          -28.78       -946.04        811.80        -21.18          6.28
    4            1.07        -94.32        -21.18          9.14         -1.09
    5            0.06         43.00          6.28         -1.09          2.07
    6           -0.13        -36.55         11.96         -1.98          2.21
    7           -0.13        -38.07          2.20         -0.89          0.66
    8           -0.52       -161.23          6.11         -1.37          0.76
    9           -2.88       -383.24         50.79         -0.58          0.05
   10           -0.60        -17.44         25.31          0.01          0.25

                                   Variable
Variable            6             7             8             9            10
    1           -0.13         -0.13         -0.52         -2.88         -0.60
    2          -36.55        -38.07       -161.23       -383.24        -17.44
    3           11.96          2.20          6.11         50.79         25.31
    4           -1.98         -0.89         -1.37         -0.58          0.01
    5            2.21          0.66          0.76          0.05          0.25
    6            3.24          0.83          1.38         -0.04          0.11
    7            0.83          0.46          0.59         -0.35         -0.18
    8            1.38          0.59          1.08          0.11         -0.05
    9           -0.04         -0.35          0.11         27.00         10.07
   10            0.11         -0.18         -0.05         10.07          6.13


APPENDIX  D 

STATISTICAL  TEST  FOR  EQUALITY  OF  DISPERSION  MATRICES 

Given a sample of two groups and m variables with group
dispersion matrices $S_1$ and $S_2$, pooled within groups
dispersion matrix $S_W$, and total sample observations
$N = N_1 + N_2$, the hypothesis that the dispersion matrices
(and thus the covariance matrices) are statistically equal
may be tested [23] by the following computations:

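\[ A = (N-2)\,\ln|S_W| - (N_1-1)\,\ln|S_1| - (N_2-1)\,\ln|S_2| \]

\[ B = \left[ \frac{1}{N_1-1} + \frac{1}{N_2-1} - \frac{1}{N-2} \right] \cdot \frac{2m^2 + 3m - 1}{6(2-1)(m+1)} \]

\[ C = \left[ \frac{1}{(N_1-1)^2} + \frac{1}{(N_2-1)^2} - \frac{1}{(N-2)^2} \right] \cdot \frac{(m-1)(m+2)}{6(2-1)} \]

\[ D = \frac{m(m+1)}{2} \]

\[ E = \frac{D + 2}{\left| B^2 - C \right|} \]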
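
The quantities A through E can be sketched in Python as follows. This is a minimal illustration, not the thesis program: it assumes the standard two-group form of Box's criterion, with scale factors (2m^2 + 3m - 1)/(6(m + 1)) for B and (m - 1)(m + 2)/6 for C, and it stops at E without forming a final test statistic. The function names and the 2 x 2 matrices are hypothetical.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def box_quantities(s1, s2, n1, n2):
    """Quantities A through E for two groups and m variables (assumed form)."""
    m = len(s1)
    n = n1 + n2
    # Pooled within-groups dispersion matrix built from the two group matrices.
    sw = [[((n1 - 1) * s1[i][j] + (n2 - 1) * s2[i][j]) / (n - 2)
           for j in range(m)] for i in range(m)]
    a = ((n - 2) * math.log(determinant(sw))
         - (n1 - 1) * math.log(determinant(s1))
         - (n2 - 1) * math.log(determinant(s2)))
    b = ((1 / (n1 - 1) + 1 / (n2 - 1) - 1 / (n - 2))
         * (2 * m * m + 3 * m - 1) / (6 * (2 - 1) * (m + 1)))
    c = ((1 / (n1 - 1) ** 2 + 1 / (n2 - 1) ** 2 - 1 / (n - 2) ** 2)
         * (m - 1) * (m + 2) / (6 * (2 - 1)))
    d = m * (m + 1) / 2
    e = (d + 2) / abs(b * b - c)
    return {"A": a, "B": b, "C": c, "D": d, "E": e}


# Made-up 2 x 2 dispersion matrices (not the thesis values), fifty observations per group.
result = box_quantities([[1.0, 0.0], [0.0, 1.0]], [[4.0, 0.0], [0.0, 1.0]], 50, 50)
```

A is zero when the two group matrices are identical and grows as they diverge, which is why it serves as the raw criterion for equality of dispersion.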


23. Box, G.E.P., "A General Distribution Theory for a Class
of Likelihood Criteria," Biometrika 36 (1949), p. 317-346.
APPENDIX E


Figures (10) through (19) depict cluster analysis plots
for each of the ten variables being studied. The label "P"
on the vertical axis of each figure represents the proportion
of each cluster that originates from the accident group.
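
The quantity plotted in these figures can be sketched as follows, assuming cluster assignments are already available; how the clusters themselves were formed is not reproduced here. The function name and the six made-up observations are hypothetical.

```python
def cluster_summary(values, is_accident, labels):
    """For each cluster, return (cluster mean of the variable,
    proportion P of the cluster's members that come from the accident group)."""
    summary = {}
    for label in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == label]
        mean = sum(values[i] for i in members) / len(members)
        p = sum(1 for i in members if is_accident[i]) / len(members)
        summary[label] = (mean, p)
    return summary


# Made-up example: total flight hours for six pilots in two clusters.
hours = [100.0, 200.0, 150.0, 4000.0, 4200.0, 3900.0]
accident_flag = [True, True, False, False, False, True]
cluster_labels = [0, 0, 0, 1, 1, 1]
summary = cluster_summary(hours, accident_flag, cluster_labels)
```

Each cluster then contributes one point to a plot: its mean on the horizontal axis and its P on the vertical axis.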


[Figure (11): P versus Total Hours (Cluster Means)]

[Figure (12): P versus Hours Flown Last 90 Days (Cluster Means)]

[Figure (13): P versus Days Since Last Flight (Cluster Means)]

[Figure (14): P versus Time Flown in Last 24 Hours (Cluster Means)]

[Figure (15): P versus Time Flown in Last 48 Hours (Cluster Means)]

[Figure (16): P versus No. Missions Last 24 Hours (Cluster Means)]

[Figure (17): P versus No. Missions Last 48 Hours (Cluster Means)]

[Figure (18): P versus Carrier Lndgs. Last 30 Days (Cluster Means)]

[Figure (19): P versus Night Car. Lndgs. Last 30 Days (Cluster Means)]


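APPENDIX F


CORRELATION MATRIX


The product-moment correlation between two variables,
X_i and X_j, is given by

\[ \rho_{ij} = \frac{\mathrm{Cov}(X_i, X_j)}{\sqrt{\mathrm{Var}(X_i)\,\mathrm{Var}(X_j)}} . \]

An element of the correlation matrix can be calculated by
the equation

\[ r_{ij} = \frac{\sum_{k=1}^{n} (X_{ik} - \bar{X}_i)(X_{jk} - \bar{X}_j)}{\left[ \sum_{k=1}^{n} (X_{ik} - \bar{X}_i)^2 \, \sum_{k=1}^{n} (X_{jk} - \bar{X}_j)^2 \right]^{1/2}} . \]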

TABLE VIII - Correlation Matrix for all Data

                        Variable
Variable         1         2         3         4         5
    1       1.0000    0.8494   -0.2030    0.1161   -0.0378
    2       0.8484    1.0000   -0.0961    0.0539   -0.0288
    3      -0.2080   -0.0961    1.0000   -0.4616    0.3111
    4       0.1161    0.0539   -0.4616    1.0000   -0.3818
    5      -0.0378   -0.0238    0.3111   -0.3318    1.0000
    6      -0.049?   -0.0553    0.3516   -0.4539    0.8682
    7      -0.0710   -0.0817    0.2644   -0.5147    0.7092
    8      -0.1163   -0.1464    0.3308   -0.5114    0.5596
    9      -0.1436   -0.1164    0.5284   -0.2957    0.1851
   10      -0.0852   -0.0577    0.4864   -0.1973    0.1932

                        Variable
Variable         6         7         8         9        10
    1      -0.0407   -0.0710   -0.1163   -0.1436   -0.0852
    2      -0.0??8   -0.0817   -0.1464   -0.1164   -0.0577
    3       0.3516    0.2644    0.3303    0.5284    0.4864
    4      -0.4?29   -0.5147   -0.5114   -0.2957   -0.1973
    5       0.8682    0.7092    0.5?06    0.1351    0.1932
    6       1.0000    0.7053    0.7591    0.1520    0.1370
    7       0.7058    1.0000    0.8499    0.0336    0.0351
    8       0.7591    0.?4?0    1.0000    0.1726    0.0994
    9       0.1520    0.0336    0.1726    1.0000    0.8166
   10       0.1370    0.0251    0.0994    0.8169    1.0000
APPENDIX G

TABLE IX - Principal Components for Control Group*


         Variables    Variables
Obs.     1 & 2        5, 6, 7, & 8


1.6699 
-1.3794 
-0.1867 
-1.7:50 
-l.?845 
-1.3456 
0.5147 
0.2131 
0.6904 
1.174-° 
1.5850 
0.  P.035 
1.1415 
0.7133 
1.5827 
lc6013 
1.6467 

1.4937 
-2.0501 
-1.7224  ! 
-1.33^6 
-0.4-34 
-3.6046 
.»8>«4636 
1.4983 


1.3798 
6.3810 
0.1488 
1194 
9568 


3724 

3248 

8672 

8991 

5973 

9013 

5121 


-0.2406 


Variable: 
Ofcs.  1 & 2 


0.5332 
0.1328 
0.8624 
0 .234*1 


0.2469 

0.0073 


*Note: The principal components were extracted from
standardised data.

TABLE X - Principal Components for Accident Group*


         Variables    Variables
Obs.     1 & 2        5, 6, 7, & 8


3.3880 

-0.2461 

0.1399 

0.8553 

1.3791 

1.5297 

-1.3378 

0.7425 

-2.9242 

-0.6633 

0.1599 

0.7544 

0.0578 

1.3880 

-I.5289 

1.1024 

0.7309 

1.4147 

0.7825 

1.1120 

-2.7147 

0.6060 

-0.8542 

0.7247 

0.0150 


-0.9997 

-0,44o6 

-0.9650 

-0.0562 

2.3743 

-I.C603 

1.8637 

-1.8509 

0.0391 

-2.2258 

2.4339 

0.7056 

1.9759 

0.0242 

0.8463 

2.5191 

-2.3258 

0.3751 

-0.4883 

-1.7367 

-1.0970 

3.0390 

1.4120 

2.0216 

-1.5083 


1.3591 
0.5521 
0.7381 
-1.93luO 
-2.7369 ' 
0.2065 
-1.57bo 
-2.9702 
0.2316 

1.5540 

1.0448 

0.6366 

-1.6751 

-1.8446 

0.8336 

0.5914 

O.3082 

-3.1236 

0.2844 

1.1647 

0.2932 

-2.1246 

1.3280 

0.7825 

1.4791 


1.6910 
. 0.0758 
-2.3258 
-2.3258 
-2.3258 
0.9042 
-0.7743 
-2.3258 
3. 9012 
-C.9173 
-2.32^8 
1.5398 
0.1790 
3.6157 
-2.3258 
-0.9173 
1.5398 
-O.5836 
-0.2023 
-1.3485 
2.0U27 
-I.I079 

-2.32^3 

0.4909 

1.3871 


*Note: The principal components were extracted from
standardised data.
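
As a rough sketch of what extracting a principal component from standardised data involves, the following Python computes the first component of a small group of variables by power iteration on their correlation matrix. It is an illustration under stated assumptions, not the procedure used for Tables IX and X: the function names, the starting vector, the iteration count, and the four made-up observations are all hypothetical.

```python
import math


def standardize(rows):
    """Subtract each variable's mean and divide by its sample standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - mu) ** 2 for x in c) / (n - 1))
           for c, mu in zip(cols, means)]
    return [[(x - mu) / sd for x, mu, sd in zip(row, means, sds)] for row in rows]


def first_component(z_rows, iterations=200):
    """Loadings and scores of the first principal component of standardized data."""
    n = len(z_rows)
    m = len(z_rows[0])
    # Correlation matrix of the standardized variables.
    r = [[sum(row[i] * row[j] for row in z_rows) / (n - 1) for j in range(m)]
         for i in range(m)]
    # Unequal starting weights, so the start is unlikely to be orthogonal
    # to the dominant eigenvector (an assumption of this sketch).
    v = [1.0 / (k + 1) for k in range(m)]
    for _ in range(iterations):
        w = [sum(r[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in z_rows]
    return v, scores


# Made-up data for two positively correlated variables (not the thesis data).
raw = [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0], [4.0, 4.0]]
loadings, scores = first_component(standardize(raw))
```

For two standardised variables with positive correlation, the first component weights both variables equally, so each score is the sum of the two standardised values divided by the square root of two.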
[Figure (20): horizontal axis "Experience"]

[Figure (21): horizontal axis "Hours Last 90 Days (Cluster Means)"]

[Figure (22)]

[Figure (23)]